Named Entity Discovery Using Comparable News Articles

Authors

  • Yusuke Shinyama
  • Satoshi Sekine
Abstract

In this paper we describe a way to discover Named Entities by using the distribution of words in news articles. Named Entity recognition is an important task for today’s natural language applications, but it still suffers from data sparseness. We rely on the observation that a Named Entity often appears synchronously in several news articles, whereas a common noun doesn’t. Exploiting this characteristic, we successfully obtained rare Named Entities with 90% accuracy just by comparing time series distributions of two articles. Although the achieved recall is not yet sufficient, we believe that this method can be used to strengthen the lexical knowledge of a Named Entity tagger.
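
The idea of comparing time series distributions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure was used, so Pearson correlation over daily word counts is an assumption made here, and the function names (daily_counts, pearson, synchrony) and the (day, text) input format are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def daily_counts(articles, word, days):
    """Return how often `word` occurs on each day in `days`.

    `articles` is a list of (day, text) pairs; this input format is an
    assumption made for the sketch, not something the paper specifies.
    """
    counts = Counter()
    target = word.lower()
    for day, text in articles:
        tokens = re.findall(r"\w+", text.lower())
        counts[day] += tokens.count(target)
    return [counts[d] for d in days]


def pearson(u, v):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    norm = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    if norm == 0:
        return 0.0
    return sum(a * b for a, b in zip(du, dv)) / norm


def synchrony(source_a, source_b, word, days):
    """Score how synchronously `word` appears in two news sources."""
    return pearson(daily_counts(source_a, word, days),
                   daily_counts(source_b, word, days))
```

Under these assumptions, a word whose daily counts rise and fall together in both sources scores close to 1 and becomes a Named Entity candidate, while a word whose counts fluctuate independently scores near or below 0. A threshold on this score would trade recall for accuracy; the value to use is not given in the abstract.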


Similar resources

MINT: A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora

In this paper, we address the problem of mining transliterations of Named Entities (NEs) from large comparable corpora. We leverage the empirical fact that multilingual news articles with similar news content are rich in Named Entity Transliteration Equivalents (NETEs). Our mining algorithm, MINT, uses a cross-language document similarity model to align multilingual news articles and then mines...


Named Entity Oriented Difference Analysis of News Articles and Its Application

To support the efficient gathering of diverse information about a news event, we focus on descriptions of named entities (persons, organizations, locations) in news articles. We extend the stakeholder mining proposed by Ogawa et al. and extract descriptions of named entities in articles. We propose three measures (difference in opinion, difference in details, and difference in factor coverage) ...


Searching for Diverse Perspectives in News Articles: Using an LSTM Network to Classify Sentiment

When searching for emerging news on named entities, many users wish to find articles containing a variety of perspectives. Advances in sentiment analysis, particularly in tools that use Recurrent Neural Networks (RNNs), have brought impressive gains in accuracy on NLP tasks. Here we describe and implement a special type of RNN called a Long Short Term Memory ...


DC Proposal: Model for News Filtering with Named Entities

In this paper we introduce the project of our PhD thesis. The subject is a model for filtering news articles. We propose a framework combining information about named entities extracted from news articles with article texts. Named entities are enriched with additional attributes crawled from semantic web resources. These properties are then used to enhance the filtering results. We described va...


Extracting Named Entities Using Named Entity Recognizer and Generating Topics Using Latent Dirichlet Allocation Algorithm for Arabic News Articles

This paper explains how to extract named entities and topics from news articles in the Arabic language. Due to the lack of high quality tools for Named Entity Recognition (NER) and topic identification for Arabic, we have built an Arabic NER (RenA) and an Arabic topic extraction tool using the popular LDA algorithm (ALDA). NER involves extracting information and identifying types, such as nam...



Publication date: 2004